Genetic automated machine learning assistant
GAMA is an AutoML package for end-users and AutoML researchers. It uses genetic programming to efficiently generate optimized machine learning pipelines given specific input data and resource constraints. A machine learning pipeline contains data preprocessing as well as a machine learning algorithm, with fine-tuned hyperparameter settings.
Document type: Article
Meta-Learning for Symbolic Hyperparameter Defaults
Hyperparameter optimization in machine learning (ML) deals with the problem
of empirically learning an optimal algorithm configuration from data, usually
formulated as a black-box optimization problem. In this work, we propose a
zero-shot method to meta-learn symbolic default hyperparameter configurations
that are expressed in terms of the properties of the dataset. This enables a
much faster, but still data-dependent, configuration of the ML algorithm,
compared to standard hyperparameter optimization approaches. In the past,
symbolic and static default values have usually been obtained as hand-crafted
heuristics. We propose an approach of learning such symbolic configurations as
formulas of dataset properties from a large set of prior evaluations on
multiple datasets by optimizing over a grammar of expressions using an
evolutionary algorithm. We evaluate our method on surrogate empirical
performance models as well as on real data across 6 ML algorithms on more than
100 datasets and demonstrate that our method indeed finds viable symbolic
defaults.
Comment: Pieter Gijsbers and Florian Pfisterer contributed equally to the
paper. V1: Two page GECCO poster paper accepted at GECCO 2021. V2: The
original full length paper (8 pages) with appendi
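To illustrate what a symbolic default is, the sketch below writes two well-known hand-crafted heuristics as formulas of dataset properties: the square-root rule for the number of features tried per split in a random forest, and the `gamma="scale"` rule that scikit-learn uses for RBF SVMs, 1 / (n_features * Var(X)). It shows neither the paper's method nor its learned formulas. The `DatasetProperties` class and all function names are hypothetical, introduced only for this example.

```python
import math
from dataclasses import dataclass
from statistics import pvariance


@dataclass(frozen=True)
class DatasetProperties:
    """Hypothetical container for meta-features a symbolic default may use."""
    n_samples: int
    n_features: int
    variance: float  # population variance over all entries of the feature matrix


def describe(X):
    """Compute the meta-features of a dataset given as a list of equal-length rows."""
    flat = [value for row in X for value in row]
    return DatasetProperties(len(X), len(X[0]), pvariance(flat))


def default_max_features(props):
    """Square-root heuristic: floor(sqrt(n_features)), but at least 1."""
    return max(1, math.isqrt(props.n_features))


def default_rbf_gamma(props):
    """'scale' heuristic: 1 / (n_features * variance of the data)."""
    return 1.0 / (props.n_features * props.variance)


# Toy dataset (assumed for illustration): 2 samples, 4 features.
X = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
props = describe(X)
print(default_max_features(props), default_rbf_gamma(props))
```

Both formulas adapt to the dataset without running any search, which is the property the abstract contrasts with standard hyperparameter optimization.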
GAMA: genetic automated machine learning assistant
GAMA is an AutoML package for end-users and AutoML researchers. It uses genetic programming to efficiently generate optimized machine learning pipelines given specific input data and resource constraints. A machine learning pipeline contains data preprocessing as well as a machine learning algorithm, with fine-tuned hyperparameter settings.
An open source AutoML benchmark
In recent years, an active field of research has developed around automated machine learning (AutoML). Unfortunately, comparing different AutoML systems is hard and often done incorrectly. We introduce an open, ongoing, and extensible benchmark framework which follows best practices and avoids common mistakes. The framework is open-source, uses public datasets and has a website with up-to-date results. We use the framework to conduct a thorough comparison of 4 AutoML systems across 39 datasets and analyze the results.
Visual exploration of migration patterns in gull data
We present a visual analytics approach to explore and analyze movement data as collected by ecologists interested in understanding migration. Migration is an important and intriguing process in animal ecology, which may be better understood through the study of tracks for individuals in their environmental context. Our approach enables ecologists to explore the spatio-temporal characteristics of such tracks interactively. It identifies and aggregates stopovers depending on the scale at which the data is visualized. Statistics of stopover sites and the links between them are shown on a zoomable geographic map, which allows users to interactively explore directed sequences of stopovers from an origin to a destination. In addition, the spatio-temporal properties of the trajectories are visualized by means of a density plot on a geographic map and a calendar view. To evaluate our visual analytics approach, we applied it to a data set of 75 migrating gulls that were tracked over a period of 3 years. The evaluation by an expert user confirms that our approach supports ecologists in their analysis workflow by helping to identify interesting stopover locations, environmental conditions or (groups of) individuals with characteristic migratory behavior, and therefore allows them to focus on visual data analysis.
OpenML-Python: an extensible Python API for OpenML
OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper, we introduce OpenML-Python, a client API for Python, which opens up the OpenML platform for a wide range of Python-based machine learning tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn extension and an extension mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem. Source code and documentation are available at https://github.com/openml/openml-python/